 atypical speech


Affect Models Have Weak Generalizability to Atypical Speech

arXiv.org Artificial Intelligence

Speech and voice conditions can alter the acoustic properties of speech, which could impact the performance of paralinguistic affect models for people with atypical speech. We evaluate publicly available models for recognizing categorical and dimensional affect from speech on a dataset of atypical speech, comparing results to datasets of typical speech. We investigate three dimensions of speech atypicality: intelligibility, which is related to pronunciation; monopitch, which is related to prosody; and harshness, which is related to voice quality. We look at (1) distributional trends of categorical affect predictions within the dataset, (2) distributional comparisons of categorical affect predictions to similar datasets of typical speech, and (3) correlation strengths between text and speech predictions for spontaneous speech for valence and arousal. We find that the output of affect models is significantly impacted by the presence and degree of speech atypicalities. For instance, the percentage of speech predicted as sad is significantly higher for all types and grades of atypical speech when compared to similar typical speech datasets. In a preliminary investigation on improving robustness for atypical speech, we find that fine-tuning models on pseudo-labeled atypical speech data improves performance on atypical speech without impacting performance on typical speech. Our results emphasize the need for broader training and evaluation datasets for speech emotion models, and for modeling approaches that are robust to voice and speech differences.


Voice Quality Dimensions as Interpretable Primitives for Speaking Style for Atypical Speech and Affect

arXiv.org Artificial Intelligence

Perceptual voice quality dimensions describe key characteristics of atypical speech and other speech modulations. Here we develop and evaluate voice quality models for seven voice and speech dimensions (intelligibility, imprecise consonants, harsh voice, naturalness, monoloudness, monopitch, and breathiness). Probes were trained on the public Speech Accessibility Project (SAP) dataset with 11,184 samples from 434 speakers, using embeddings from frozen pre-trained models as features. We found that our probes had both strong performance and strong generalization across speech elicitation categories in the SAP dataset. We further validated zero-shot performance on additional datasets, encompassing unseen languages and tasks: Italian atypical speech, English atypical speech, and affective speech. The strong zero-shot performance and the interpretability of results across an array of evaluations suggest the utility of using voice quality dimensions in speaking style-related tasks.


Hypernetworks for Personalizing ASR to Atypical Speech

arXiv.org Artificial Intelligence

Parameter-efficient fine-tuning (PEFT) for personalizing automatic speech recognition (ASR) has recently shown promise for adapting general population models to atypical speech. However, these approaches assume a priori knowledge of the atypical speech disorder being adapted for -- the diagnosis of which requires expert knowledge that is not always available. Even given this knowledge, data scarcity and high inter- and intra-speaker variability further limit the effectiveness of traditional fine-tuning. To circumvent these challenges, we first identify the minimal set of model parameters required for ASR adaptation. Our analysis of each individual parameter's effect on adaptation performance allows us to reduce Word Error Rate (WER) by half while adapting 0.03% of all weights. Alleviating the need for cohort-specific models, we next propose the novel use of a meta-learned hypernetwork to generate highly individualized, utterance-level adaptations on the fly for a diverse set of atypical speech characteristics. Evaluating adaptation at the global, cohort, and individual levels, we show that hypernetworks generalize better to out-of-distribution speakers, while maintaining an overall relative WER reduction of 75.2% using 0.1% of the full parameter budget.
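
For readers unfamiliar with the metrics quoted in this abstract, the following is a minimal sketch of how word error rate and a relative WER reduction are conventionally computed. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_wer_reduction` and the example sentences are hypothetical, and text normalization (casing, punctuation) is assumed to have been done beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by adaptation."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Illustrative values only; these are not results from the paper.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
reduction = relative_wer_reduction(baseline_wer=0.40, adapted_wer=0.10)
```

Under this definition, a relative reduction of 75.2% means the adapted model's WER is roughly a quarter of the baseline WER, whatever the baseline's absolute value.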


Google made an app to ease communication for people with speech impairments

Engadget

For too long, people with speech impairments have struggled to be understood not only by other people, but also by voice-based technology. Though some companies have started to make their products work better for people with atypical speech, the most prevalent services still don't hear them well. Google announced today that it has made a new Android app called Project Relate that could help people with speech impairments communicate more easily with others and the Assistant. It's looking for beta testers to test and improve the app starting today. As Julie Cattiau, a product manager for Google Research, said in a video, "standard speech recognition doesn't always work as well for people with atypical speech because the algorithms have not been trained on samples of their speech."


AI Technologies that are Reshaping Social Infrastructure

#artificialintelligence

Together with the rise of the Internet, access to large repositories of data has helped machine learning technology grow exponentially. The incredibly quick pace of growth was unprecedented. As a result, it is obvious that AI will make a significant impact on the world in the years to come. However, with the numerous established and emerging fields of AI around today, such a blanket statement doesn't provide much concrete meaning. What fields and applications of AI are receiving the most investment and development?